
Adaptive Hierarchical Motion-Focused Model for Video Prediction

  • Conference paper
  • In: Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)


Abstract

Video prediction is a promising task in computer vision with many real-world applications, and it is worth exploring. Most existing methods generate new frames from appearance features with few constraints, which results in blurry predictions. Recently, motion-focused methods have been proposed to alleviate this problem. However, owing to the variety and complexity of real-world motions, it is difficult to capture object motions from a video sequence and to apply the learned motions to appearance. In this paper, an adaptive hierarchical motion-focused model is introduced to predict realistic future frames. The model combines hierarchical motion modeling with an adaptive transformation strategy, which together yield better motion understanding and application. We train the model end to end and employ adversarial training to improve the quality of the generated frames. Experiments on two challenging datasets, Penn Action and UCF101, demonstrate that the proposed model is effective and competitive with state-of-the-art approaches.
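The method is described only at this level of detail in the abstract, so the following is a minimal, hypothetical PyTorch sketch of the general recipe it outlines: multi-scale (hierarchical) motion features extracted from consecutive frames, an adaptive transformation that warps the current frame toward the future, and an adversarial loss on the result. All class names, layer widths, the hierarchy depth, and the choice of flow-plus-grid_sample warping are assumptions for illustration, not the authors' implementation.

# Illustrative sketch only: names, channel widths, hierarchy depth, and the
# grid_sample-based warping are assumptions, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEncoder(nn.Module):
    # Stacks strided convs over a pair of frames, yielding motion features at
    # several scales (a stand-in for hierarchical motion modeling).
    def __init__(self, in_ch=3, base=32, levels=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(2 * in_ch if i == 0 else base, base, 3, stride=2, padding=1)
            for i in range(levels))

    def forward(self, prev_frame, cur_frame):
        x = torch.cat([prev_frame, cur_frame], dim=1)
        feats = []
        for conv in self.convs:
            x = F.relu(conv(x))
            feats.append(x)  # coarse-to-fine motion features
        return feats

class AdaptiveWarp(nn.Module):
    # Predicts a dense flow field from motion features and warps the current
    # frame with it: one plausible "adaptive transformation" of appearance.
    def __init__(self, base=32):
        super().__init__()
        self.to_flow = nn.Conv2d(base, 2, 3, padding=1)

    def forward(self, frame, motion_feat):
        b, _, h, w = frame.shape
        flow = F.interpolate(self.to_flow(motion_feat), size=(h, w),
                             mode='bilinear', align_corners=False)
        # Identity sampling grid in [-1, 1], offset by the predicted flow.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing='ij')
        grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        return F.grid_sample(frame, grid + flow.permute(0, 2, 3, 1),
                             align_corners=False)

class FrameDiscriminator(nn.Module):
    # Tiny real/fake classifier for the adversarial term on predicted frames.
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

if __name__ == "__main__":
    enc, warp, disc = MotionEncoder(), AdaptiveWarp(), FrameDiscriminator()
    prev_f, cur_f, target = (torch.rand(1, 3, 64, 64) for _ in range(3))
    pred = warp(cur_f, enc(prev_f, cur_f)[-1])  # predicted next frame
    # Generator objective: reconstruction plus a non-saturating GAN term.
    g_loss = F.l1_loss(pred, target) + F.binary_cross_entropy_with_logits(
        disc(pred), torch.ones(1, 1))
    print(pred.shape, g_loss.item())

In a full training loop the discriminator would be optimized in alternation with the predictor, and the transformation would normally be applied at every level of the hierarchy rather than only the coarsest; both simplifications are deliberate in this sketch.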



Acknowledgement

This work is supported by the Shenzhen Peacock Plan (20130408-183003656), the Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), and the National Natural Science Foundation of China (NSFC, No. U1613209).

Author information


Corresponding author

Correspondence to Wenmin Wang.



Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Tang, M., Wang, W., Chen, X., He, Y. (2018). Adaptive Hierarchical Motion-Focused Model for Video Prediction. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_53


  • DOI: https://doi.org/10.1007/978-3-030-00776-8_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science, Computer Science (R0)
